1. Introduction

On March 17, 2020, President Trump referred to the coronavirus as the “China Virus.” Shortly afterward, the number of anti-Chinese incidents began to increase across the United States. One aspect of public health that is often overlooked is how influential public officials and leaders are in disseminating public health information. Their words can change not only the public’s views on a health matter but also a nation’s perspective on a group’s identity. [ADD SOME BACKGROUND LITERATURE THAT BACKS THIS CLAIM UP] In addition, given the influence of identity politics, we may expect the term “China Virus” to be more polarizing among certain identities and in certain states. [ADD BACKGROUND LITERATURE + IDENTIFY SUBJECTS OF INTEREST]. Few studies have analyzed these aspects of the pandemic.

Thus, our project explores how interest in the term “China Virus” varies across states and how that interest relates to states’ political, demographic, and COVID-19 characteristics. We ask the question: [INSERT OUR RESEARCH QUESTION]? To answer this question, we aim to gain a better understanding of [INSERT NAME OF BAYESIAN METHODOLOGY]. Ultimately, we hope our research inspires others to explore and better understand the impact language has on the public during times of crisis.


2. Data

2.1 Data Descriptions

2.2 Variables of interest

2.3 Visualizations [NEED TO WEAVE A COHESIVE MESSAGE BETWEEN THESE VISUALIZATIONS TO TELL A STORY]

2.3a Demographic

Our first visualization compares the percentage of residents in each state who identify as white with the percentage who identify as Asian. White-identifying residents make up a larger share overall, most notably in the Midwest and Northeast. Asian-identifying residents make up a small share (below 0.1 on the plotted scale) in nearly every state, with the exceptions of California and New York. The visualization also shows that states such as Texas, California, and New Mexico have much lower shares of white-identifying residents, which may be informative for our analysis.

2.3b Google China Virus

During the week of 2020-03-14 to 2020-03-21, Trump referred to the coronavirus as the “China Virus” in an official press announcement, and we wanted to see how his comments affected search patterns across states.

This plot shows “China Virus” search interest over time, grouped by region. Certain key events appear to trigger an overall uptick in searches. The plot does not make clear which regions search “China Virus” more or less often, but it does show that the regions move together in search interest, which suggests that national-level events, such as a Donald Trump tweet, trigger these spikes.
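The regional comparison in this paragraph can be sketched in a few lines. This is a minimal illustration, assuming the search data are stored as (state, region, day, interest) rows; the function names `region_daily_means` and `pearson` and the toy values are hypothetical, not our data or code. It averages state-level interest within each region for each day and then computes the Pearson correlation between two regions' daily series; a correlation near 1 is one way to quantify the impression that regions "move together."

```python
from collections import defaultdict
from statistics import mean


def region_daily_means(rows):
    """Average state-level search interest within each region for each day.

    rows: iterable of (state, region, day, interest) tuples (hypothetical layout).
    Returns {region: [(day, mean_interest), ...]} sorted by day.
    """
    buckets = defaultdict(list)
    for _state, region, day, interest in rows:
        buckets[(region, day)].append(interest)
    out = defaultdict(list)
    # Sorting by (region, day) keeps each region's series in date order (ISO dates sort as strings).
    for (region, day), values in sorted(buckets.items()):
        out[region].append((day, mean(values)))
    return dict(out)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy values, NOT our data: two regions, two states each, three days.
toy = [
    ("AA", "South", "2020-03-16", 20), ("BB", "South", "2020-03-16", 30),
    ("AA", "South", "2020-03-17", 60), ("BB", "South", "2020-03-17", 80),
    ("AA", "South", "2020-03-18", 40), ("BB", "South", "2020-03-18", 50),
    ("CC", "West", "2020-03-16", 10), ("DD", "West", "2020-03-16", 20),
    ("CC", "West", "2020-03-17", 70), ("DD", "West", "2020-03-17", 90),
    ("CC", "West", "2020-03-18", 30), ("DD", "West", "2020-03-18", 40),
]
series = region_daily_means(toy)
south = [v for _, v in series["South"]]
west = [v for _, v in series["West"]]
print(series["South"])
print(round(pearson(south, west), 3))
```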

The density plot of ChinaVirusInterest over our study period (2020-03-14 to 2020-03-21) looks approximately normal, apart from a small bump at 0. We believe this bump occurs because 0 is the lowest value the interest score can take, so low values pile up at that bound. One could argue against a normal model because the distribution looks slightly right-skewed, but our team believes a normal distribution is a reasonable description of the density.
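To put a number on the "slightly right-skewed" impression and on the bump at 0, one could compute the sample skewness and the share of observations at exactly 0. This is an illustrative sketch under assumptions: `sample_skewness` and `share_at_zero` are hypothetical helpers, the skewness uses the simple moment estimator without small-sample correction, and the toy values are made up. A positive skewness indicates a longer right tail; a skewness near 0 is consistent with, but does not prove, a symmetric, normal-like shape.

```python
from statistics import mean


def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def share_at_zero(xs):
    """Fraction of observations sitting exactly at the lower bound of 0."""
    return sum(1 for x in xs if x == 0) / len(xs)


# Hypothetical toy interest values, NOT our data.
toy = [0, 0, 12, 25, 31, 38, 40, 44, 47, 52, 58, 66, 81, 100]
print(round(sample_skewness(toy), 3), round(share_at_zero(toy), 3))
```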

Google interest in the term “China Virus” varies widely between states. Very few states have high density at the upper end of the interest scale, but several show notable density peaks at lower values; for example, Alaska, Wyoming, and Iowa have unusual peaks around the 25-50 range. It is also interesting that there is no obvious mean or median value of China Virus interest shared across states.
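A per-state summary makes this between-state variability concrete. In this hypothetical sketch (`per_state_summary` and the toy rows are illustrative, not our data), each state's daily values are reduced to a median and a range; the spread of the state medians then shows how far apart states' typical interest levels are, which is the between-state variation the repeated measures model in Section 3 is designed to capture.

```python
from collections import defaultdict
from statistics import median


def per_state_summary(rows):
    """Summarize each state's daily interest values.

    rows: iterable of (state, day, interest) tuples (hypothetical layout).
    Returns {state: {"median": ..., "min": ..., "max": ..., "range": ...}}.
    """
    values = defaultdict(list)
    for state, _day, interest in rows:
        values[state].append(interest)
    summary = {}
    for state, xs in values.items():
        summary[state] = {
            "median": median(xs),
            "min": min(xs),
            "max": max(xs),
            "range": max(xs) - min(xs),
        }
    return summary


# Hypothetical toy values, NOT our data: two states, three days each.
toy = [("AA", 1, 10), ("AA", 2, 30), ("AA", 3, 50),
       ("BB", 1, 60), ("BB", 2, 100), ("BB", 3, 70)]
summary = per_state_summary(toy)
medians = sorted(s["median"] for s in summary.values())
print(summary)
print("spread of state medians:", medians[-1] - medians[0])
```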

2.3c COVID-19

This visualization depicts the distribution of positive COVID-19 cases by region and by which political party won each state in the 2016 election. In the Midwest, Mountain, and West regions, states won by Democrats have a wider range and higher quantiles of positive cases. In the Northeast and South, the average number of positive cases is also higher for Democratic states, but only slightly. This pattern is interesting because it suggests that the association between political party and positive cases differs by region.
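The grouped boxplot can also be summarized numerically by region and 2016 winning party. A minimal sketch, assuming each row is (state, region, party, positive_cases) with hypothetical party codes "D" and "R" and made-up counts; `cases_by_region_party` is a hypothetical helper. It reports the number of states, the median, and the interquartile range for each group, using `statistics.quantiles` with its default method. With only a handful of states per group, these summaries are noisy, so small differences between groups should be read cautiously.

```python
from collections import defaultdict
from statistics import median, quantiles


def cases_by_region_party(rows):
    """Median and interquartile range of positive cases per (region, party) group.

    rows: iterable of (state, region, party, positive_cases) tuples.
    Groups with fewer than two states get an IQR of None.
    """
    groups = defaultdict(list)
    for _state, region, party, cases in rows:
        groups[(region, party)].append(cases)
    out = {}
    for key, xs in sorted(groups.items()):
        iqr = None
        if len(xs) >= 2:
            q1, _q2, q3 = quantiles(xs, n=4)  # three cut points: Q1, median, Q3
            iqr = q3 - q1
        out[key] = {"n": len(xs), "median": median(xs), "iqr": iqr}
    return out


# Hypothetical toy values, NOT our data.
toy = [
    ("S1", "Northeast", "D", 100), ("S2", "Northeast", "D", 200),
    ("S3", "Northeast", "D", 300), ("S4", "Northeast", "D", 400),
    ("S5", "Northeast", "D", 500),
    ("S6", "Northeast", "R", 150), ("S7", "Northeast", "R", 250),
    ("S8", "South", "R", 120),
]
for key, group_stats in cases_by_region_party(toy).items():
    print(key, group_stats)
```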



3. Methods & Models (Feddy and Will)

3.1 Model 1: Repeated Measures Model

For our simplest model, we decided to use a repeated measures model. Our team considered the repeated measures structure a necessary component because of how our data are set up: each state has a ChinaVirusInterest value for each day in our target period (2020-03-14 to 2020-03-21). Given these repeated measures and our prior understanding of the varying characteristics (demographic, political, COVID-19 impact) of different states, the repeated measures model allows us to capture differences in ChinaVirusInterest through \(\theta_i\), which represents each state’s mean value.
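Because the repeated measures model expects one ChinaVirusInterest value per state per day, it is worth confirming that the panel is complete before fitting. A minimal sketch, assuming the window 2020-03-14 to 2020-03-21 includes both endpoints (8 days) and that observations are available as (state, day) pairs; `study_days`, `missing_state_days`, and the state codes are hypothetical.

```python
from datetime import date, timedelta


def study_days(start=date(2020, 3, 14), end=date(2020, 3, 21)):
    """All calendar days in the study window, including both endpoints."""
    n = (end - start).days + 1
    return [start + timedelta(days=k) for k in range(n)]


def missing_state_days(rows, states):
    """Return (state, day) pairs that have no ChinaVirusInterest value.

    rows: iterable of (state, day) pairs that have an observation.
    An empty result means every state has one value per day (a balanced panel).
    """
    observed = set(rows)
    return [(s, d) for s in states for d in study_days() if (s, d) not in observed]


# Hypothetical example: state "BB" is missing its final day.
states = ["AA", "BB"]
observed = [("AA", d) for d in study_days()] + [("BB", d) for d in study_days()[:-1]]
print(missing_state_days(observed, states))
```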

Model Structure

\[\begin{aligned} Y_{ij}|\theta_i, \mu, \sigma_w, \sigma_b \sim N(\theta_i,\sigma_w^2)\\ \theta_i|\mu,\sigma_b \overset{ind}{\sim} N(\mu, \sigma_b^2)\\ \sigma_b,\sigma_w \sim Exp(...) \end{aligned}\]

\(Y_{ij} = \) ChinaVirusInterest for state \(i\) on day \(j\)
\(\theta_i = \) state \(i\)’s unique mean value
\(\mu = \) overall mean of the state means
\(\sigma_w = \) within-state variation (standard deviation)
\(\sigma_b = \) between-state variation (standard deviation)
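To see what \(\sigma_b\) and \(\sigma_w\) control, one can simulate data from the model with fixed parameter values. This is a forward simulation, not a model fit, and the values of \(\mu\), \(\sigma_b\), and \(\sigma_w\) are hypothetical choices for illustration only (the Exp(...) priors are left unspecified). Each state's mean \(\theta_i\) is drawn around \(\mu\), and each daily value is drawn around its state's \(\theta_i\); when \(\sigma_b\) is much larger than \(\sigma_w\), the state means spread out far more than the days within a state do.

```python
import random
from statistics import mean, pstdev


def simulate_repeated_measures(n_states, n_days, mu, sigma_b, sigma_w, seed=0):
    """Draw Y[i][j] from the repeated measures model with fixed parameters.

    theta_i ~ N(mu, sigma_b^2) for each state i;
    Y_ij | theta_i ~ N(theta_i, sigma_w^2) for each day j.
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(mu, sigma_b) for _ in range(n_states)]
    ys = [[rng.gauss(theta, sigma_w) for _ in range(n_days)] for theta in thetas]
    return thetas, ys


# Hypothetical parameter values for illustration only; NOT estimates from our data.
thetas, ys = simulate_repeated_measures(n_states=50, n_days=8, mu=40, sigma_b=15, sigma_w=5)
within = mean(pstdev(row) for row in ys)     # average spread of days within a state
between = pstdev([mean(row) for row in ys])  # spread of the state averages
print(round(within, 1), round(between, 1))
```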

3.2 Model Descriptions



4. Model Evaluation (RESULTS)

4.1 Model 1 Evaluation (Repeated measures)

In the output above, we see that the within-state standard deviation \(\sigma_w\) is much narrower than the between-state standard deviation \(\sigma_b\). This matches our intuition that a repeated measures (hierarchical) model, which accounts for differences between states, can explain a large share of the variation in ChinaVirusInterest.
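The comparison of \(\sigma_w\) and \(\sigma_b\) can also be stated as a posterior probability. A minimal sketch, assuming paired posterior draws of both parameters (one pair per MCMC iteration) have already been extracted from the fitted model as plain lists; `compare_deviations` and the toy draws are hypothetical. It returns the share of draws in which \(\sigma_w < \sigma_b\), an estimate of \(P(\sigma_w < \sigma_b \mid \text{data})\), and the posterior median of the ratio \(\sigma_w / \sigma_b\).

```python
from statistics import median


def compare_deviations(sigma_w_draws, sigma_b_draws):
    """Summarize paired posterior draws of sigma_w and sigma_b.

    Returns (share of draws with sigma_w < sigma_b, posterior median of sigma_w / sigma_b).
    """
    if len(sigma_w_draws) != len(sigma_b_draws):
        raise ValueError("draws must be paired (same length)")
    pairs = list(zip(sigma_w_draws, sigma_b_draws))
    prob = sum(1 for w, b in pairs if w < b) / len(pairs)
    ratio = median(w / b for w, b in pairs)
    return prob, ratio


# Hypothetical toy draws, NOT output from our model.
prob, ratio = compare_deviations([4, 5, 6, 5], [12, 14, 9, 4])
print(prob, round(ratio, 3))
```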

This correlation table shows a relatively strong correlation among the daily observations within each state. It is therefore appropriate for us to use a hierarchical model that accounts for these inherent differences between states.
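The link between the correlation table and the model can be made explicit. Writing \(Y_{ij} = \theta_i + \varepsilon_{ij}\), with \(\varepsilon_{ij} \sim N(0, \sigma_w^2)\) independent across days and independent of \(\theta_i\), and integrating over \(\theta_i \sim N(\mu, \sigma_b^2)\), two different days \(j \ne k\) in the same state \(i\) satisfy:

\[\begin{aligned} \operatorname{Var}(Y_{ij} \mid \mu, \sigma_b, \sigma_w) &= \sigma_b^2 + \sigma_w^2\\ \operatorname{Cov}(Y_{ij}, Y_{ik} \mid \mu, \sigma_b, \sigma_w) &= \operatorname{Var}(\theta_i) = \sigma_b^2\\ \rho = \operatorname{Cor}(Y_{ij}, Y_{ik} \mid \mu, \sigma_b, \sigma_w) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2} \end{aligned}\]

So the model implies a within-state correlation \(\rho\) that approaches 1 when \(\sigma_b\) is much larger than \(\sigma_w\); for example, \(\sigma_b = 3\sigma_w\) gives \(\rho = 9/10\). A strong observed within-state correlation is therefore what this model predicts when between-state variation dominates.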

4.2 Model 2 Evaluation (Normal Regression)

4.3 Model 3 Evaluation (Repeated Reg + Normal)

4.4 Model 4 Evaluation (Longitudinal)



5. Results (Will)

5.1 Posterior Predictions All States

5.1.a Table

5.2 Posterior Prediction One State

5.3 Final Model



6. Conclusion

6.1 Limitations

6.2 Future Work



7. Acknowledgments and References